Statistician Jacob Cohen wrote a charming essay in 1994, "The Earth Is Round (p < .05)", about the problems with "95% confidence" testing and its misapplication. A 2011 Psychology Today blog post by Joachim Krueger, "What Cohen Meant", explains:
Jacob Cohen (1923 - 1998) was a pioneer of psychological statistics. He taught us about effect sizes, power analysis, and multivariate regression, among many other things. I have always admired his ability to combine technical rigor with good judgment. During the last decade of his life, Cohen published two particularly insightful papers in the American Psychologist. Both had to do with Null Hypothesis Significance Testing (NHST). In "Things I have learned (so far)," Cohen (1990) questioned the idea of living by p values alone and suggested that researchers avail themselves of the multiple tools they can find in the statistical toolbox. In "The earth is round (p < .05)," Cohen (1994) outed himself as a Bayesian. He made clear what many were already dimly aware of, namely that what you want from statistical testing is the probability that a hypothesis is true given the evidence, whereas what you get from the standard tests is the probability of the evidence assuming that the (null) hypothesis is true. How to get from the latter to the former is a matter of ongoing debate ...
Cohen's original essay is full of hilarious asides:
- "Like many men my age, I mostly grouse. My harangue today is on testing for statistical significance ..."
- "... we, as teachers, consultants, authors, and otherwise perpetrators of quantitative methods, are responsible for the ritualization of null hypothesis significance testing (NHST; I resisted the temptation to call it statistical hypothesis inference testing) to the point of meaninglessness and beyond. ..."
- "... For example, Meehl described NHST as 'a potent but sterile intellectual rake who leaves in his merry path a long train of ravished maidens but no viable scientific offspring' ..."
Jacob Cohen also offers examples of misapplied deductive reasoning, leading up to the classic:
If a person is an American then he is probably not a member of Congress (TRUE, RIGHT?)
This person is a member of Congress.
Therefore, he is probably not an American.
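The conclusion collapses once base rates enter the picture. Here is a back-of-the-envelope sketch in Python (the population figures are rough assumptions for illustration, not numbers from Cohen's paper):

```python
# Rough check of the Congress syllogism using approximate counts:
# ~330 million Americans, 535 members of Congress (all of them American).
americans = 330_000_000
members_of_congress = 535

# Premise: P(not a member of Congress | American) -- essentially 1, so "TRUE, RIGHT?"
p_not_congress_given_american = 1 - members_of_congress / americans
print(f"P(not Congress | American) = {p_not_congress_given_american:.7f}")  # ~0.9999984

# Conclusion the syllogism draws: a member of Congress is probably not American.
# The conditional actually runs the other way, and it is zero:
p_not_american_given_congress = 0.0  # every member of Congress is American
print(f"P(not American | Congress) = {p_not_american_given_congress:.7f}")  # 0.0000000
```

A true premise about P(B|A) tells you almost nothing about P(A|B); the two are linked only through the base rates.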
Cohen points out that this has exactly the same form as the reasoning behind NHST: if the null hypothesis is true, then this statistically "significant" result would probably not occur; the result did occur; therefore the null hypothesis is probably not true. That invalid syllogism is the root of much bad science. As Mark Reid puts it in his 2009 blog post:
Repeat after me: "the p-value is NOT the probability the null hypothesis is true given the observed data".
And as Reid then quotes Cohen:
What's wrong with NHST? Well, among many other things, it does not tell us what we want to know, and we so much want to know what we want to know that, out of desperation, we nevertheless believe that it does! What we want to know is "Given these data, what is the probability that H0 is true?" But as most of us know, what it tells us is "Given that H0 is true, what is the probability of these (or more extreme) data?" These are not the same ...
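Bayes' theorem makes the gap concrete. A minimal sketch (the prior, alpha, and power values below are illustrative assumptions, not Cohen's numbers) shows that a result "significant at p < .05" can still leave H0 with a substantial posterior probability:

```python
# Contrast P(significant result | H0) -- roughly what a p-value threshold reports --
# with P(H0 | significant result) -- what we actually want to know.
prior_h0 = 0.80   # assumed: 80% of tested hypotheses are in fact null
alpha = 0.05      # chance of a "significant" result when H0 is true
power = 0.50      # assumed chance of a "significant" result when H0 is false

# Total probability of seeing a significant result, then Bayes' theorem:
p_sig = prior_h0 * alpha + (1 - prior_h0) * power
posterior_h0 = prior_h0 * alpha / p_sig

print(f"P(significant | H0) = {alpha:.2f}")         # 0.05
print(f"P(H0 | significant) = {posterior_h0:.2f}")  # ~0.29, far from 0.05
```

Under these assumptions a "p < .05" finding still leaves the null hypothesis with roughly a 29% chance of being true, which is Cohen's point: the two conditional probabilities are not interchangeable.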
(cf. Medicine and Statistics (2010-11-13), Introduction to Bayesian Statistics (2010-11-20), ...) - ^z - 2013-12-01